Device, method and computer program product for generating web feeds

ABSTRACT

A system for dynamically defining a web feed includes a memory unit adapted to store web feed data and to generate a web feed of selected web content. The system includes an input processor to receive a user input defining one or more remote websites and to retrieve remote web content from the one or more remote websites. A user interface is provided to display a set of identified elements from the remote web content in a display area of a primary website and a selection processor receives a user selection identifying one or more selected elements of the remote web content. An equivalency engine calculates equivalency classes including subsets of the identified elements determined to be structurally similar to the selected elements. A web feed is generated and displayed to the user on the primary website that includes at least the selected elements and one or more of the subsets of the identified elements determined to be structurally similar to the selected elements.

PRIORITY CLAIM

This application is a continuation of U.S. patent application Ser. No. 14/191,113, filed Feb. 26, 2014, now issued as U.S. Pat. No. 9,448,983, which is a continuation of U.S. patent application Ser. No. 11/868,981, filed Oct. 9, 2007, now issued as U.S. Pat. No. 8,706,757, which claims priority to U.S. Provisional Patent Application 60/901,115, filed Feb. 14, 2007, each of which is incorporated by reference herein.

TECHNICAL FIELD

The present invention relates to a method, a device and a computer program product for generating feeds.

BACKGROUND

Web feeds (also referred to as feeds or channels) are data formats used for serving users frequently updated content. A web feed can include multiple items. U.S. Patent Application Pub. No. 2006/0288329 of Gandhi et al., which is incorporated herein by reference, illustrates a content syndication platform.

Content distributors syndicate a web feed, thereby allowing users to subscribe to it. Accordingly, only content that is included in a predefined web feed can be syndicated. Content distributors sometimes also define a programmatic interface to their content (also known as an API), which allows programmatic access to the content.

There is a growing need to provide a more flexible and yet simple system, method and computer program product for defining distributable content from any web source, not just those that have a predefined feed or API. For example, this is a key requirement in the creation of “web mashups” (programmatic combinations of multiple web sites and other data sources), which otherwise depend on the existence of feeds and APIs.

SUMMARY

A method for generating a feed is provided. The method includes: receiving selection information representative of a selection of a selected element out of multiple elements of a web content; and generating an equivalent indication representative of at least one equivalent element that is similar to the selected element.

A system for dynamically defining a web feed is provided including a memory unit adapted to store web feed data and to generate a web feed of selected web content. The system includes an input processor to receive a user input defining one or more remote websites and to retrieve remote web content from the one or more remote websites. The system includes a user interface to display a set of identified elements from the remote web content in a display area of a primary website and a selection processor to receive a user selection identifying one or more selected elements of the remote web content. An equivalency engine calculates equivalency classes including subsets of the identified elements determined to be structurally similar to the selected elements. A web feed is generated and displayed to the user on the primary website that includes at least the selected elements and one or more of the subsets of the identified elements determined to be structurally similar to the selected elements.

A non-transitory computer readable storage medium having stored therein data representing instructions executable by a programmed processor for dynamically defining a web feed is provided to receive a sample set of remote webpages, and to extract content from the remote webpages to produce a set of identified elements. The set of identified elements is displayed in a display area of a primary website, and the structural similarities of the identified elements are determined. Associated keys are assigned to each identified element describing a structural characteristic of the identified element. A subset of the identified elements determined to be structurally similar based at least on the associated keys is grouped in equivalency classes. A user selection is received identifying one or more selected elements from the set of identified elements displayed on the primary website, and a web feed is generated and displayed to the user in the display area of the primary website including the selected elements and the subset of identified elements.

Other systems, methods, features and advantages will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected and defined by the following claims. Nothing in this section should be taken as a limitation on those claims. Further aspects and advantages are discussed below in conjunction with the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting and non-exhaustive embodiments are described with reference to the following drawings. In the drawings, like reference numerals refer to like parts throughout the various figures unless otherwise specified.

FIG. 1 illustrates a method for generating a web feed, according to an embodiment of the invention, and also initial uses of such feed creation techniques.

FIG. 2 illustrates a method for distributing a web feed, according to an embodiment of the invention.

FIG. 3 illustrates a system according to an embodiment of the invention.

FIG. 4 illustrates a screen displayed to a user, according to an embodiment of the invention.

DETAILED DESCRIPTION

The term “feed” as used herein below refers to preconfigured feeds, such as RSS feeds, and to dynamic feeds that require input before providing the feed, also known as APIs.

The term “web content” as used herein below refers to content accessible over the Internet. It may include a web page, a portion of a web page, information that is included in a web page, and the like.

The method, computer program product and system described herein are for generating and distributing web feeds.

The system maintains a primary website that can be accessed by users that wish to generate web feeds, update web feeds or delete web feeds. Once a user browses to the primary website, the system enables the user to define a new web feed or update an existing web feed by using a graphical interface.

A web feed is created once and can then be distributed repeatedly (or according to a predefined schedule) in order to obtain desired content from a remote website associated therewith in a structured format. In this sense, the method, computer program product and system are a visual application programming interface (API) creator for any website, without the need for programming.

The content of a remote website is retrieved by the web feed while being formatted in a structured format. The system can manipulate, transform, and use this content. The content can be processed by programs that can be accessed via the primary web site, but this is not necessarily so. For example, users can create their own programs in any programming language that use the output of any available web feed.

The web feed creation process entails browsing the remote website inside the primary website, defining which portions (elements) of the content of the remote website will be desired in the future, and assigning semantic meanings (in the form of a name and groupings) to these content elements (e.g. “Article Author”). When the web feed is distributed (for example, when the user requests content via the web feed), the system extracts the desired content from the remote website as it exists at that point in time, and then names each piece of content using the semantic definitions that the user originally supplied. As a result of the aggregate assignment of semantic information to portions of websites, the system builds, over time, a semantic understanding of the web.

In order to define a web feed, users go through an interactive visual process. This process entails supplying the system with one or more exemplary pages on a remote website which include content of interest. The exemplary pages conveniently have the same layout and structure, but different content (e.g. the results of several searches on a search engine). Furthermore, the user can choose input variables (e.g. “search term” or login information) through a visual interface on the primary website.

Once the user has submitted the exemplary pages on the remote website, the system performs an algorithmic analysis of these exemplary pages. This analysis identifies similarities between the different exemplary pages and between elements within each exemplary page. The result is a programmatic understanding of the pages' structures, which is stored and used as the basis for the new web feed.

The user selects desired content by clicking on various elements in the exemplary pages and assigns a name to each content type (e.g. “Search Result Title”). The user can also define and name relationships between content types, including grouping several content types together (e.g. “Search Result Title” and “Search Result Summary” belong to a “Search Result”). This process is entirely visual and point-and-click, thereby allowing a user to construct a sophisticated and powerful API with no programming.

The web feed is encoded within the system as a mapping between the names (and additionally or alternatively tags or attributes) the user supplied and the technical information necessary to extract the relevant content from any instance of similar pages on the website. In addition, the web feed may be described by additional metadata that the user supplies, such as data type (string, integer, etc.), length and other attributes. The user may also define for each content type (field) a certain constraint or post-processing rule (e.g. a regular expression which removes strings matching “X”, or a delimiter string that dissects the content into multiple instances).
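By way of non-limiting illustration, one field of such a mapping, together with its post-processing rules, can be sketched in Python as follows. The class name FieldDefinition, its attribute names and the sample values are hypothetical and are not taken from the described system; the sketch only shows a regular expression removal followed by a delimiter split.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldDefinition:
    """Hypothetical record for one user-named field of a web feed."""
    name: str                              # user-supplied name, e.g. "Article Author"
    keys: Dict[str, str]                   # technical information used to locate elements
    data_type: str = "string"              # optional metadata supplied by the user
    remove_pattern: Optional[str] = None   # regular expression whose matches are removed
    delimiter: Optional[str] = None        # splits the content into multiple instances

    def post_process(self, raw: str) -> List[str]:
        """Apply the removal rule first, then the delimiter rule."""
        text = raw
        if self.remove_pattern:
            text = re.sub(self.remove_pattern, "", text)
        parts = text.split(self.delimiter) if self.delimiter else [text]
        # Drop surrounding whitespace and empty fragments.
        return [part.strip() for part in parts if part.strip()]


author = FieldDefinition(
    name="Article Author",
    keys={"tag": "span"},
    remove_pattern=r"^By\s+",
    delimiter=",",
)
print(author.post_process("By Ann Lee, Bob Ray"))
```

In this sketch the removal rule runs before the split, which is a design choice of the example rather than a requirement stated above.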

According to an embodiment of the invention the web feed can become readily available for users of the system. The individual contributions of each user construct a comprehensive database that enables a complete coverage of the web in the form of semantic understanding and programmatic interaction with websites.

In order to interact with any web feed, a user uses or creates software that communicates with the system over the web using a URL. As such, the output of any web feed is available on the web at a specific address. The URL provides the mechanism for supplying variable inputs and requesting the desired output, as well as the means for passing the relevant contents on which to run various algorithms that can be applied by the system.
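As a minimal sketch of how variable inputs and a requested output format could be carried in such a URL, the following Python fragment builds a request address. The base address, the parameter names "feed" and "format" and the "v_" prefix for variable inputs are assumptions made for illustration only; the actual URL scheme of the system is not specified here.

```python
from urllib.parse import urlencode


def feed_request_url(base, feed_name, output_format, variables):
    """Build a request URL for a web feed (hypothetical parameter names)."""
    query = {"feed": feed_name, "format": output_format}
    # Variable inputs chosen at creation time, e.g. "search term".
    for name, value in variables.items():
        query["v_" + name] = value
    return base + "?" + urlencode(query)


url = feed_request_url(
    "https://primary.example/run",   # placeholder address
    "news search",
    "xml",
    {"search term": "dapper"},
)
print(url)
```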

Upon receiving any such request, the system runs its algorithms on the content of the remote website as it exists at that moment in time, and compares its results to the mapping stored in the web feed. Using this method, the system extracts the relevant content and names the content pieces using the mapping defined by the user during the creation phase. The system returns the named content in any of several formats, including XML.
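For illustration, returning named content as XML can be sketched with the Python standard library as follows. The element names "feed", "item" and "field" are assumptions of this sketch, not the output schema of the system; field names are carried in an attribute because user-supplied names may contain spaces.

```python
import xml.etree.ElementTree as ET


def to_xml(feed_name, records):
    """Serialize extracted records (dicts of field name -> content) to XML."""
    root = ET.Element("feed", {"name": feed_name})
    for record in records:
        item = ET.SubElement(root, "item")
        for field_name, value in record.items():
            child = ET.SubElement(item, "field", {"name": field_name})
            child.text = value
    return ET.tostring(root, encoding="unicode")


document = to_xml("news", [{"Article Author": "Ann Lee"}])
print(document)
```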

In addition to user-created web feeds, the system is capable of the automatic creation of new web feeds and modification of existing web feeds. By automatically examining similarities between web contents, the system is able to harness the information of an existing web feed to improve upon it or create a new web feed. This provides the system with an ever-growing coverage of the web which is not restricted by the need for user interaction.

The system interfaces between content owners or content providers and content consumers. According to an embodiment of the invention, the system can enforce content usage and/or content access limitations imposed by the content owners or content providers. For example, content providers can specify preferences and terms of use for their content using a web-based interface. Content consumers are able to register their needs and agree to the terms set forth by content providers. This embodiment provides both the technological means to access the content provider's content as well as the legal legitimacy required to do so. The content exchange platform allows for various forms of compensation to the content owner from the content consumer, including financial, link-driven traffic, and brand exposure. The system can automatically create, enforce, and execute a business agreement between the two parties.

According to various embodiments of the invention, semantic information captured by the system in the web feed creation phase and through automatic web-service creation can be leveraged in a variety of ways. The system's semantic understanding of websites can be used to enable searching the web using semantic information (e.g. “find all pages that contain recipes with less than 200 calories.”) In doing so, the system empowers existing information retrieval tools that treat the web as a structured dataset to locate and retrieve information in a more powerful and precise manner.

According to yet further embodiments of the invention this semantic understanding can be applied to advertising networks to better match and target advertisements to content. By using the method, system and computer program product to create a feed and semantically describe their site, website owners can place semantic based advertisements that directly relate to an understanding of the content, as opposed to traditional methods of keyword matching (e.g. “provide a link to my supermarket checkout with a pre-filled shopping cart whenever a list of ingredients exists.”). For advertising or affiliate network feeds that have an API, website owners can also use this semantic information to programmatically select between existing affiliate network feeds to choose both the most appropriate merchant and dynamically display the most appropriate merchant products for each page on their site. If the merchant, or advertiser, does not have an appropriate web feed, this invention can be used to easily create such a web feed.

The system, and especially web feed related information, can be utilized by applications, services, websites, and devices that reuse content from the web. This content is either owned by the content user or by a third party.

Conveniently, equivalency engine 430 is used in both processes: creating a new web feed and running an existing web feed over a page. However, its inputs and outputs differ. In the creation process, the equivalency engine receives a sample of pages upon which it runs and executes all of the ECMs. The result of this process is the assignment of a set of keys (per ECM) for every element in the page. This mapping between keys and elements is then used in the GUI of the system. As described above, the user chooses an element in the GUI. When a user chooses an element she, in effect, chooses a (possibly complete) sub-set of keys associated with the element, and can add additional characteristics such as pattern matching constraints. The user is provided with visual feedback from the system as it singles out the other elements in the page that share keys with the element chosen. So, to further the example described above, if the user clicked on an element whose tag is a link (<a href=“X”></a>), all of the other links in the page will be highlighted, representing all elements that are also links. Through the GUI the user can then intuitively define the set of keys (e.g. the key that relates to the ECM matching element tags) and the value of the key (<a> tag) she'd like as output of the web feed she defines.
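The feedback step described above, in which elements sharing keys with the chosen element are singled out, can be sketched as follows. This is a minimal illustration under stated assumptions: only two hypothetical keys are computed per element ("tag" and "depth"), they stand in for the output of the ECMs, and the sample markup is assumed to be well formed with explicit closing tags.

```python
from html.parser import HTMLParser


class KeyAssigner(HTMLParser):
    """Assigns two illustrative keys (tag and depth) to every element."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        self.elements.append({"tag": tag, "depth": self.depth})
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def equivalent_indices(elements, chosen_index, key_names):
    """Indices of all elements sharing the chosen subset of keys."""
    chosen = elements[chosen_index]
    return [
        index
        for index, element in enumerate(elements)
        if all(element[name] == chosen[name] for name in key_names)
    ]


parser = KeyAssigner()
parser.feed('<div><a href="x">A</a><p>text</p><a href="y">B</a></div>')
# The user clicks the first link (index 1) and keeps only the "tag" key.
print(equivalent_indices(parser.elements, 1, ["tag"]))
```

Choosing a different subset of keys changes which elements are highlighted; selecting only the hypothetical "depth" key, for example, would also match the paragraph element.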

The web feed distribution process is conducted after the web feed has been defined, and the subsets of keys and their respective values have been defined by the user, along with any other characteristics she saw fit. The equivalency engine performs a different task in this mode. It gets as input the page to run on, along with the chosen set of keys, their values and other characteristics. Then the Runner runs over the provided page, executing the ECMs and producing values for all of the keys. As it runs over the page and produces keys, it checks against the uploaded web feed definition, looking for matching elements and content that got key values equal to the ones stored in the WS. If there is a match and the other characteristics are met, the element content is added to the structured content output. At the end of the process, the equivalency engine outputs the structured content aggregated during the pass over the page.
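The matching pass of the distribution process can be sketched as follows. The sketch assumes that keys have already been produced for every element of the page and are held in plain dictionaries; the function name run_feed and the data layout are hypothetical, and additional characteristics such as pattern matching constraints are omitted.

```python
def run_feed(elements, definition):
    """Collect the content of elements whose keys match a stored field.

    elements:   list of {"keys": {...}, "content": str}, one per page element
    definition: {field name: {key name: required value}}
    """
    output = {name: [] for name in definition}
    for element in elements:
        for name, wanted in definition.items():
            # An element matches when every stored key has an equal value.
            if all(element["keys"].get(k) == v for k, v in wanted.items()):
                output[name].append(element["content"])
    return output


page = [
    {"keys": {"tag": "a", "level": 3}, "content": "First title"},
    {"keys": {"tag": "p", "level": 3}, "content": "First summary"},
    {"keys": {"tag": "a", "level": 3}, "content": "Second title"},
    {"keys": {"tag": "a", "level": 1}, "content": "Home"},
]
stored = {"title": {"tag": "a", "level": 3}, "summary": {"tag": "p"}}
print(run_feed(page, stored))
```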

There are two modes, one for regular users and one for power users. Regular users don't select keys directly, but rather either by clicking directly on the content, or by clicking on various controls. The regular, direct click algorithm takes into account the prior state of the element that receives the click, the tagset on the element, and other elements that have already been selected. The algorithm analyzes this information and then modifies the set of selected items appropriately, while trying to minimize the changes to selections that the user made (as opposed to those selected algorithmically). The regular interface may also provide various controls, for example: (i) a table control that gives the user control to select table specific artifacts from a page (e.g. a column or row); (ii) an isolation control which allows the user to limit the scope of search for equivalent elements; (iii) a sensitivity control which gives the user control over the number and type of keys used for equivalence; (iv) a regular expression control which allows the user to select sub-parts of an element's content.

In the power mode, the user can select sets of keys, and combinations thereof, directly through the GUI.

FIG. 1 illustrates method 100 for generating a web feed according to an embodiment of the invention.

Method 100 starts with stage 110 of displaying a graphical interface to a user that browses to a primary website. Such a web site can be www.dapper.org, but this is not necessarily so.

According to an embodiment of the invention the graphical interface includes a window for inserting a remote web page locator such as a Uniform Resource Locator. The graphical interface also includes various mechanisms to allow a user to browse in order to find the appropriate page.

Stage 110 is followed by stage 120 of receiving one or more web pages of a remote web site from a user that browses to the primary web site. The web pages can be received one at a time, after being selected by the user. The selection utilizes the graphical interface.

Stage 120 can include: (i) stage 122 of receiving browsing information such as locator information and browsing to the remote web site that is identified by the locator, (ii) stage 124 of enforcing access and/or usage policies or rules of the remote web site, (iii) stage 126 of displaying a web page of the remote web site in response to input provided by the user, and the like.

It is noted that stage 124 of enforcing can include preventing a user from accessing the remote web site, preventing the user from accessing a certain web page of the remote web site, preventing the user from downloading certain content, conditioning the access to content or retrieval of content, and the like. The conditioning can include limiting the number of accesses of the user per time period, requiring the user to pass one or more tests (such as inserting text representative of a wrapped image), requiring the user to pay for access or for certain information, and the like. In this sense the primary web site enables the remote web site to enforce its access and/or usage policies.
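One of the conditions mentioned above, limiting the number of accesses of a user per time period, can be sketched as a sliding-window counter. The class name AccessLimiter and its parameters are hypothetical; the sketch takes the current time as an argument so that its behavior does not depend on a clock, and it does not cover the other forms of conditioning.

```python
from collections import defaultdict, deque


class AccessLimiter:
    """Allows at most max_accesses per user within period_seconds."""

    def __init__(self, max_accesses, period_seconds):
        self.max_accesses = max_accesses
        self.period = period_seconds
        self.history = defaultdict(deque)   # user -> timestamps of granted accesses

    def allow(self, user, now):
        window = self.history[user]
        # Forget accesses that fell out of the time period.
        while window and now - window[0] >= self.period:
            window.popleft()
        if len(window) >= self.max_accesses:
            return False
        window.append(now)
        return True


limiter = AccessLimiter(max_accesses=2, period_seconds=60)
print([limiter.allow("user-1", t) for t in (0, 10, 20, 61)])
```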

It is noted that these access and/or usage policies can be applied each time the remote web site is accessed or content is retrieved during a distribution of a web feed that includes content from that remote web site.

Stage 120 can involve multiple repetitions of any of stages 122-126, such as to provide one or more web pages for analysis.

Stage 120 is followed by stage 140 of calculating equivalent classes, where each equivalent class includes web content representation elements that are mutually equivalent.

According to an embodiment of the invention stage 140 includes calculating equivalent classes by an equivalency engine (also referred to as core engine). At the end of stage 120 the equivalency engine can receive one or more sample web pages or URLs that form a sample set. The sample set can be of any size, from a single page upwards.

According to various embodiments of the invention, if the sample set includes multiple web pages then the equivalency engine can differentiate between static elements (static content) and dynamic elements (dynamic content) within the sample set. The differentiating can include ignoring dynamic elements. Static content is defined as content that repeats on any or many of the samples, while dynamic content includes content that changes from page to page. For example, three samples of different search results from a search engine will all contain the logo of the search engine, which will be considered static content, but will contain different results, unique to each page, which will be considered dynamic.
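A minimal sketch of this differentiation follows. It assumes that each sample page has already been reduced to a list of text fragments and treats a fragment as static when it appears in at least a given fraction of the samples; the function name and the min_fraction parameter are hypothetical.

```python
from collections import Counter


def split_static_dynamic(samples, min_fraction=1.0):
    """Separate content repeating across samples from per-page content."""
    counts = Counter()
    for sample in samples:
        counts.update(set(sample))          # count each fragment once per page
    threshold = min_fraction * len(samples)
    static = {text for text, seen in counts.items() if seen >= threshold}
    dynamic = [[text for text in sample if text not in static] for sample in samples]
    return static, dynamic


samples = [
    ["Search Engine Logo", "result a", "result b"],
    ["Search Engine Logo", "result c"],
    ["Search Engine Logo", "result d"],
]
static, dynamic = split_static_dynamic(samples)
print(static, dynamic)
```

Lowering min_fraction below 1.0 would correspond to content that repeats on many, rather than all, of the samples.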

According to various embodiments of the invention stage 140 can include at least one of the following or a combination thereof: (i) calculating at least one key for each element; (ii) storing the at least one key per element; (iii) calculating multiple keys of different abstraction levels per element; (iv) choosing to store a subset of the associated keys, thus defining the strictness or looseness of the field definition; (v) calculating equivalent classes in response to structural characteristics of the elements; (vi) determining an equivalency of a first element and a second element in response to a characteristic of equivalent elements of the first element and a characteristic of the second element; (vii) calculating equivalent classes in response to previous elections of the user; (viii) calculating equivalent classes in response to elections of another user; (ix) calculating equivalent classes in response to an indication representative of an equivalency level of an equivalency class; (x) calculating equivalent classes in response to an indication representative of a scope of a search for equivalent elements.

According to an embodiment of the invention, in stage 140 the calculating of equivalency classes includes linking elements in a web content representation (such as in a document object model (DOM) representation of a web page or another semi-structured web format such as but not limited to RSS). Equivalence can be defined as structural equivalence and can be defined by one or more different heuristics.

Conveniently, keys generated during stage 120 allow for easy identification of classes of web page sub-trees that have equivalent structure. The keys can be assigned based upon key definitions that can be updated over time. These keys are conveniently robust to changes on a page.

A sample equivalence class describes the “Most Complex Structure” (MCS) within a DOM representation of a web page which is the oldest (closest to the root of the DOM) but has a similar static sub-tree structure.

The MCS computation algorithm can use examples to differentiate between static and dynamic elements in a page (static elements do not change for different instances of a page, while dynamic elements can change for each instance of a page). For example, if a user searched for the term “dapper” on Google™ (thus the web site www.google.com is the remote web site), then the search result will include multiple web pages that have a similar structure (simplified for illustration purposes): (i) Title: bolded version of the search term; (ii) Description: bolded version of the search term, URL, Size, “Cached” link, “Similar pages” link, “Note this” link, “More results” link. It is noted that dynamic elements are in italics. They do not show up for every repeating structure.

Conveniently, stage 120 includes calculating keys that describe the element and its relative structure within the DOM. These keys are used to calculate similarity between different structures. Different types of keys can be used in order to compute different types of equivalences (or similarity).

A key is computed for each element which describes a structural characteristic of the element. An element with multiple children that have the same key defines a Most Complex Structure (MCS), which is the oldest (closest to the root of the DOM) but has a similar static sub-tree structure. For elements that have no such MCS, the root tag (HTML) is considered the MCS ancestor. Each MCS defines an MCS element key which defines it as an MCS, and allows elements to easily be linked to their MCS ancestor.

Conveniently, stage 120 includes generating an internal element key for each element. This internal element key includes multiple attributes such as: (i) HTML tag; (ii) static content of the element (where applicable); (iii) MCS ancestor (where applicable); and (iv) relative offset from the MCS ancestor (using a DFS numbering scheme, where applicable).

Conveniently, stage 120 includes generating a cousin key for each element. The cousin key includes: (i) a tag, (ii) an MCS Tag, (iii) an MCS Key, (iv) an absolute level (from the root of the tree), and (v) relative level from MCS ancestor.
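The two kinds of keys can be contrasted with a simplified Python sketch. Several assumptions are made and are not taken from the description: MCS elements are flagged in advance rather than computed, the MCS ancestor of an element is taken to be its nearest enclosing flagged element (the root otherwise), and the static content and MCS Key attributes are left out. Under these assumptions, the internal key carries the DFS offset from the MCS ancestor while the cousin key carries levels only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)
    is_mcs: bool = False    # assumed to be decided by a separate MCS step


def assign_keys(root):
    """Return, in DFS order, simplified internal and cousin keys per element."""
    keys = []
    counter = 0

    def visit(node, level, mcs_tag, mcs_level, mcs_number):
        nonlocal counter
        number = counter            # DFS (pre-order) number of this element
        counter += 1
        keys.append({
            "tag": node.tag,
            # (tag, MCS ancestor tag, DFS offset from the MCS ancestor)
            "internal": (node.tag, mcs_tag, number - mcs_number),
            # (tag, MCS ancestor tag, absolute level, level relative to MCS)
            "cousin": (node.tag, mcs_tag, level, level - mcs_level),
        })
        if node.is_mcs:
            # Descendants are keyed relative to this element.
            mcs_tag, mcs_level, mcs_number = node.tag, level, number
        for child in node.children:
            visit(child, level + 1, mcs_tag, mcs_level, mcs_number)

    visit(root, 0, root.tag, 0, 0)
    return keys


results = Node("ul", [Node("li", [Node("a")]), Node("li", [Node("a")])], is_mcs=True)
page = Node("html", [Node("body", [results])])
for entry in assign_keys(page):
    print(entry)
```

In this example the two links receive the same cousin key but different internal keys, so the cousin key groups repeating items while the internal key still tells them apart.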

According to an embodiment of the invention stage 120 further includes generating easily retrievable data structures representative of the equivalency classes. Samples of easily retrievable data structures can include: (i) a first list of all of an element's descendant tags; (ii) a second list that includes an element's level, tag name, EKMC.

Stage 120 conveniently includes a heuristic determining that two elements are equivalent if a ratio between the number of unique first list elements (that exist in only one MCS) and the number of first list elements in their union is less than some constant.
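The ratio heuristic can be written out as a short sketch. It assumes that the first lists are compared as sets of descendant tags and that the constant is 0.3; both choices, as well as the function name, are assumptions made for illustration.

```python
def descendants_equivalent(tags_a, tags_b, max_ratio=0.3):
    """Compare two descendant-tag lists using the unique/union ratio."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    if not union:
        return True                 # two elements without descendants
    unique = a ^ b                  # tags that exist in only one of the lists
    return len(unique) / len(union) < max_ratio


first = ["a", "b", "span", "cite"]
second = ["a", "b", "span", "cite", "img"]   # one extra, dynamic tag
print(descendants_equivalent(first, second))
```

Here one tag out of five is unique, so the ratio is 0.2 and the two structures are treated as equivalent despite the extra element.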

Stage 120 can assign a unique key per each table element, another key per each table row, a further key per each table column and yet a further key for all the cells of the table.

Yet according to another embodiment of the invention stage 140 can include utilizing one or more equivalency classes that were calculated in the past. These equivalency classes can be calculated in relation to one or more other users, can be responsive to inputs of one or more other users, and the like. Thus, instead of calculating new equivalency classes, stage 120 can involve utilizing previously calculated equivalency classes.

Stage 140 is followed by stage 160 of receiving selection information representative of a selection of an element out of multiple elements of a web content representation. The selection can be made by simply clicking on a selected element of a web page of a remote web site that is being displayed to the user.

Stage 160 is followed by stage 180 of generating an equivalent indication representative of at least one equivalent element that is similar to the selected element. Stage 180 can include emphasizing equivalent elements of the displayed web page. The emphasis can include highlighting equivalent elements, surrounding these equivalent elements by a frame, or utilizing any known graphical technique.

Stage 180 is followed by stage 200 of allowing a user to respond to the generation of the equivalent indication. The user can perform at least one of the following or a combination thereof: (i) select at least one equivalent element and optionally define its associated meta-data and/or semantic information; (ii) de-select at least one equivalent element; (iii) de-select the selected element; (iv) elect a non-equivalent element; (v) change at least one characteristic of the similarity algorithm.

It is noted that multiple iterations of stage 180 can occur and the user can provide an end of stage indication before method 100 continues to stage 240.

It is further noted that the response of the user can cause method 100 to try to find a minimal equivalency class such as to include only the selected element and the equivalent elements that were selected (or not de-selected) by the user.

Stage 200 can include waiting for a certain period (that can be time limited) but this is not necessarily so. The user can receive reminders that urge him either to perform one or more of the above-mentioned operations or to terminate the election stage. It is noted that the user can perform multiple elections.

Stage 200 is followed by stage 240 of defining a web feed. The web feed will include a selected element and can include one or more equivalent elements. The one or more equivalent elements can be selected by the user (either by positively electing the equivalent element or by merely not de-selecting an equivalent element).

Conveniently, the generation of a web feed also involves receiving and processing metadata such as but not limited to semantic content. The metadata can include linking information that links between selected items, a user definition of one or more selected items, and the like.

For example, a user can define hierarchies by creating a group of elements. This definition is received during stage 200. The user can choose (and the method receives) any combination of elements and previously defined groups of elements to define a new group. Once a group is defined (conveniently by the user) the method receives metadata that reflects relationships between repeating instances of different elements. For example, assume that during stage 120 multiple search result web pages are received. A typical search result web page includes search results, each including a title with a link to the search result and a summary of the page linked. After defining two fields, “title” and “summary”, the user can define a group named “search result” that will facilitate an association of the first result title with the first result summary, the second result title with the second result summary, and so on, allowing an equivalency engine to return results in a hierarchically structured format. Optionally, the user can also define more complex hierarchies such as groups inside groups and groups that contain both groups and fields. Once the user has finished defining the fields and groups, she can give the web feed a name and additional meta-data such as tags and description, save it and start using it.
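The association of repeating field instances into a group can be sketched as follows. The sketch pairs instances purely by their position (first title with first summary, and so on), which is an assumption of this example; the function name build_group and the sample values are hypothetical.

```python
def build_group(group_name, fields):
    """Pair the i-th instance of every field into one grouped record."""
    names = list(fields)
    rows = zip(*(fields[name] for name in names))
    return {group_name: [dict(zip(names, row)) for row in rows]}


extracted = {
    "title": ["First result", "Second result"],
    "summary": ["Summary of the first page", "Summary of the second page"],
}
print(build_group("search result", extracted))
```

Because the value of a field may itself be a list of grouped records, the same pairing could be applied again to express groups inside groups.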

Stage 200 is also followed by stage 250 of processing received metadata. The metadata (and information) relates to the selected element and can also relate to equivalent elements.

Stage 250 includes mapping between names included in received metadata and technical information necessary to extract relevant content from any instance of similar pages on the website. It can also include creating an XML representation of the feed to be created. The web feed data structure can include information relating to web feeds generated by users as well as information that associates between related web feeds. Web feeds can be related to each other if they are associated with similar metadata (especially similar semantic metadata). Similarity between web feeds can be learnt from the identity of users that subscribed to and/or defined the web feeds. If certain users subscribed to certain web feeds, these web feeds can be associated with each other. Statistics relating to the subscription to web feeds, timing between subscription to different web feeds, identity of users that subscribed to different web feeds, unsubscribing from web feeds, and/or metadata that links the web feeds to each other can provide an indication about the association level between different web feeds.

The web feed data structure or at least a portion thereof (especially semantic information relating to web feeds) can be exposed to multiple users. According to an embodiment of the invention, the web feed can become readily available to users of the system. The individual contributions of each user can construct a web feed data structure that enables a significant (even full) coverage of the web in the form of semantic understanding and programmatic interaction with websites.

Conveniently, stage 250 is followed by stage 270 of creating the appropriate web feed format. The user can select the type of feed requested and the system creates the web feed from the internal XML.
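
The conversion from an internal XML representation to a requested feed format can be sketched as follows. The internal schema shown (feed, group and field elements) is purely an assumption, since the description does not define it, and the output is a simplified RSS-like document that omits channel elements such as link and description.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal representation; the actual internal XML is not specified.
INTERNAL_XML = """
<feed name="search results">
  <group name="search result">
    <field name="title">First result</field>
    <field name="summary">Summary of the first page</field>
  </group>
  <group name="search result">
    <field name="title">Second result</field>
    <field name="summary">Summary of the second page</field>
  </group>
</feed>
"""


def to_rss(internal_xml: str) -> str:
    """Convert the assumed internal XML into a simplified RSS-like document."""
    source = ET.fromstring(internal_xml.strip())
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = source.get("name", "")
    for group in source.findall("group"):
        # Collect the named fields of this group instance.
        fields = {f.get("name"): (f.text or "") for f in group.findall("field")}
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = fields.get("title", "")
        ET.SubElement(item, "description").text = fields.get("summary", "")
    return ET.tostring(rss, encoding="unicode")


rss_text = to_rss(INTERNAL_XML)
```

Other feed types could be produced from the same internal XML by substituting a different conversion function.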

Conveniently, method 100 includes stage 290 of automatically creating web feeds and, additionally or alternatively, modifying existing web feeds. Stage 290 can include examining similarities between web pages, based upon the content of the web feed data structure. For example, if after running the standard key generating algorithms another web site has the same set of keys (or similar keys based on some heuristic), the system can use the same selections and semantic information provided by the user for the original web site.
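
The comparison of key sets between web sites can be sketched as follows. The Jaccard overlap and the 0.8 threshold are assumptions standing in for the unspecified heuristic, and the key strings and site names are hypothetical.

```python
from typing import Dict, Optional, Set


def key_similarity(keys_a: Set[str], keys_b: Set[str]) -> float:
    """Jaccard overlap of two key sets (an assumed heuristic)."""
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


def find_reusable_site(new_keys: Set[str],
                       known_sites: Dict[str, Set[str]],
                       threshold: float = 0.8) -> Optional[str]:
    """Return the known site whose keys best match new_keys, if any reaches the threshold."""
    best_site, best_score = None, threshold
    for site, keys in known_sites.items():
        score = key_similarity(new_keys, keys)
        if score >= best_score:
            best_site, best_score = site, score
    return best_site


# Hypothetical keys previously generated for a site with user-provided selections.
known_sites = {"site-a.example": {"div.result/h3", "div.result/p", "div.nav/a"}}

same_structure = find_reusable_site({"div.result/h3", "div.result/p", "div.nav/a"}, known_sites)
different_structure = find_reusable_site({"table/tr/td"}, known_sites)
```

When a match is found, the selections and semantic information stored for the matching site can be applied to the new site without further user input.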

According to an embodiment of the invention, method 100 includes stage 295 of searching web pages based upon semantic information (or other metadata) associated with these web feeds (e.g. “find all web pages that include recipes with less than 200 calories”).
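
The recipe query above can be sketched as a filter over feed items that carry semantic metadata. The FeedItem type, its attribute names and the example URLs are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeedItem:
    """A feed item annotated with user-supplied semantic metadata (hypothetical type)."""
    page_url: str
    semantic_type: str
    attributes: Dict[str, float]


def find_pages(items: List[FeedItem], semantic_type: str,
               attribute: str, below: float) -> List[str]:
    """Return pages containing an item of the given type whose attribute is below a limit."""
    pages: List[str] = []
    for item in items:
        value = item.attributes.get(attribute)
        if item.semantic_type != semantic_type or value is None:
            continue
        if value < below and item.page_url not in pages:
            pages.append(item.page_url)
    return pages


items = [
    FeedItem("https://example.com/salad", "recipe", {"calories": 150.0}),
    FeedItem("https://example.com/cake", "recipe", {"calories": 420.0}),
    FeedItem("https://example.com/news", "article", {"calories": 100.0}),
]
low_calorie_pages = find_pages(items, "recipe", "calories", 200.0)
```

Only pages whose items were semantically tagged as recipes are considered, which distinguishes this search from plain keyword matching.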

According to an embodiment of the invention, method 100 includes stage 297 in which semantic understanding can be applied to advertising networks or affiliate networks that have an API. Website owners can use this semantic information to programmatically select between existing affiliate network feeds, both to choose the most appropriate merchant and to dynamically display the most appropriate merchant products for each page on their site. If the merchant or advertiser does not have an appropriate web feed, this invention can be used to easily create such a web feed.

According to other embodiments of the invention, the system can be used to better match and target advertisements or merchandise to content. Website owners can use the system to place semantically powered advertising that directly relates to an understanding of the content, as opposed to traditional methods of keyword matching (e.g. “provide a link to my supermarket checkout with a pre-filled shopping cart whenever a list of ingredients exists.”)

After creating a web feed, the user may choose to edit the web feed or create a new web feed based on it. The process of editing a web feed is conveniently similar to the process of creating one, except for the fact that the user need not supply the pages to work on and the web feed is pre-defined.

FIG. 2 illustrates method 300 for distributing a web feed according to an embodiment of the invention.

Method 300 starts by stage 310 of determining to initiate a web feed distribution process or receiving a trigger that triggers a web feed distribution process.

Stage 310 is followed by stage 320 of retrieving information representative of a web feed, wherein the information includes at least one selected element.

Stage 320 is followed by stage 340 of searching for at least one new equivalent element, wherein a new equivalent element is equivalent to the at least one selected element and is not included in the web feed. Stage 340 can include generating the newly retrieved web page and calculating equivalency classes, in a manner that is analogous to stage 120.

Stage 340 is followed by stage 350. If the new keys generated for the website differ, according to some metric, from the original keys, then, as described in stage 350, the system notifies the user (either by email or otherwise) that their feed has degraded and may no longer work properly.

Stage 340 is followed by stage 360 of generating an updated web feed that includes the at least one selected element and the at least one new equivalent element, if the at least one new equivalent element was found during stage 340.

Stage 360 is followed by stage 380 of syndicating the updated web feed.
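
Stages 340 to 360 can be sketched as follows. The sketch assumes that two elements are equivalent when their structural keys are equal and that degradation is measured as the fraction of original keys still present on the page; neither assumption is mandated by the description, and all names and sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Element:
    key: str       # structural key of the element (assumed to be a string)
    content: str


@dataclass
class WebFeed:
    selected_key: str            # key of the element the user selected
    original_keys: Set[str]      # keys generated when the feed was defined
    items: List[str] = field(default_factory=list)


def keys_degraded(original: Set[str], new: Set[str], min_overlap: float = 0.5) -> bool:
    """Stage 350 (assumed metric): too few of the original keys remain on the page."""
    if not original:
        return False
    return len(original & new) / len(original) < min_overlap


def update_feed(feed: WebFeed, page_elements: List[Element]) -> bool:
    """Stages 340 and 360: add new equivalent elements; return True if the feed degraded."""
    new_keys = {e.key for e in page_elements}
    degraded = keys_degraded(feed.original_keys, new_keys)
    for element in page_elements:
        # A new equivalent element shares the selected key and is not yet in the feed.
        if element.key == feed.selected_key and element.content not in feed.items:
            feed.items.append(element.content)
    return degraded


feed = WebFeed("div.story/h2", {"div.story/h2", "div.nav/a"}, ["Old headline"])
page = [
    Element("div.story/h2", "Old headline"),
    Element("div.story/h2", "New headline"),
    Element("div.nav/a", "Home"),
]
feed_degraded = update_feed(feed, page)
```

The returned flag corresponds to the condition under which the user would be notified, and the updated feed.items list is what stage 380 would syndicate.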

FIG. 3 illustrates system 400 according to an embodiment of the invention.

System 400 can include various software, firmware, middleware and/or hardware components. It is typically connected to users via one or more networks.

System 400 may represent practically any type of computer, computer system or other programmable electronic device. System 400 may be connected in a network or, in the alternative, may be a stand-alone device. System 400 can be connected to other devices via wired and/or wireless links. It is noted that system 400 can be characterized by a centralized architecture but that it can also be characterized by a distributed architecture. Accordingly, the various components of system 400 can be located near each other, but this is not necessarily so.

FIG. 3 illustrates system 400 as including memory unit 410 and processor 420. Memory unit 410 is adapted to store information representative of a web feed, wherein the information includes at least one selected element. Memory unit 410 can store the web feed data structure or portions thereof.

Processor 420 is adapted to search for at least one new equivalent element. A new equivalent element is equivalent to at least one selected element and is not included in the web feed. Processor 420 is also adapted to generate an updated web feed that includes the at least one selected element and the at least one new equivalent element, and to syndicate the updated web feed.

It is noted that system 400 can perform various stages of method 100 and, additionally or alternatively, can perform various stages of method 300.

According to an embodiment of the invention, memory unit 410 is adapted to store selection information representative of a selection of selected elements out of multiple elements of a web content representation, and processor 420 is adapted to generate an equivalent indication representative of at least one equivalent element that is similar to the selected elements; wait for a user to elect at least one equivalent element; and define a web feed that comprises the selected element and at least one equivalent element, if at least one equivalent element exists.

FIG. 3 also illustrates various modules. These modules can be software modules that are executed by processor 420, but this is not necessarily so.

Equivalency engine 430 can calculate equivalency classes, can locate elements that are equivalent to selected elements, and the like.

Equivalency class modules A-C 442-446 are sample equivalency engines. Each includes information representative of mutually equivalent elements. These modules, as well as additional modules (such as a metadata module), can form a web feed data structure.

Runner module 450 scans web pages and sends elements of these web pages to the equivalency engine.
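
The cooperation between a runner and an equivalency engine can be sketched as follows. The sketch assumes that an element's key is its path of tag names and class attributes and that elements with identical keys fall into the same equivalency class; these are illustrative assumptions rather than the key generating algorithms of the embodiments. It also assumes markup in which every opened tag is closed.

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class Runner(HTMLParser):
    """Scans a page and records a (structural key, text) pair for each text-bearing element."""

    def __init__(self) -> None:
        super().__init__()
        self.path: List[str] = []
        self.elements: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self.path.append(f"{tag}.{css_class}" if css_class else tag)

    def handle_endtag(self, tag):
        if self.path:
            self.path.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.path:
            # The key is the path from the root to the enclosing element.
            self.elements.append(("/".join(self.path), text))


def equivalency_classes(elements: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group elements that share the same structural key."""
    classes: Dict[str, List[str]] = defaultdict(list)
    for key, text in elements:
        classes[key].append(text)
    return dict(classes)


# Hypothetical page content.
PAGE = """
<div class="results">
  <div class="result"><h3>First result</h3><p>First summary</p></div>
  <div class="result"><h3>Second result</h3><p>Second summary</p></div>
</div>
"""

runner = Runner()
runner.feed(PAGE)
classes = equivalency_classes(runner.elements)
```

In this sketch, selecting the first title would allow the second title to be offered as an equivalent element because both share the same key.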

Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

During the definition of a web feed, multiple screens can be displayed to the user. FIG. 4 illustrates a sample screen 500 that is presented to a user during the definition of the web feed, according to an embodiment of the invention.

Screen 500 includes various control icons 502, 505 and 506, web page display area 510, selected element area 520 and group area 530. It is noted that the screen is displayed after a user browses to the primary website and enters a URL or other information that represents a web site that is displayed (or one of its pages is displayed). It is noted that the user can also select the format of the web feed.

Web page display area 510 is used to display web pages, either in their original format or including highlighted elements that can represent a selected element and, additionally or alternatively, one or more equivalent elements.

Control icon “change similarity detection” 502 can be used to determine which equivalency algorithm is used and, additionally or alternatively, what equivalency level is required to define two elements as equivalent items. Control icon “select inside” 506 allows a display of a portion of an element. Selected element area 520 is used to display selected elements. Field area 525 is used to display the field names given to selected elements. Group area 530 is used to display groups and the elements included in the groups. It is noted that names or other attributes of elements and groups can be displayed within areas 520 and 530. Screen 500 also contains an interactive feed preview area 550.

The illustrations of the embodiments described herein are intended to provide a general understanding of the structure of the various embodiments. The illustrations are not intended to serve as a complete description of all of the elements and features of apparatus and systems that utilize the structures or methods described herein. Many other embodiments may be apparent to those of skill in the art upon reviewing the disclosure. Other embodiments may be utilized and derived from the disclosure, such that structural and logical substitutions and changes may be made without departing from the scope of the disclosure. Additionally, the illustrations are merely representational and may not be drawn to scale. Certain proportions within the illustrations may be exaggerated, while other proportions may be minimized. Accordingly, the disclosure and the figures are to be regarded as illustrative rather than restrictive.

One or more embodiments of the disclosure may be referred to herein, individually and/or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any particular invention or inventive concept. Moreover, although specific embodiments have been illustrated and described herein, it should be appreciated that any subsequent arrangement designed to achieve the same or similar purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all subsequent adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the description.

The Abstract of the Disclosure is provided with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, various features may be grouped together or described in a single embodiment for the purpose of streamlining the disclosure. This disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter may be directed to less than all of the features of any of the disclosed embodiments. Thus, the following claims are incorporated into the Detailed Description, with each claim standing on its own as defining separately claimed subject matter.

The above disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments, which fall within the true spirit and scope of the present invention. Thus, to the maximum extent allowed by law, the scope of the present invention is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description.

We claim:
 1. A system for dynamically defining a web feed, the system comprising: a memory unit adapted to store web feed data and to generate a web feed of selected web content; an input processor in operative communication with the memory unit configured to receive a user input defining one or more remote websites and to retrieve remote web content from the one or more remote websites; a user interface configured to display a set of identified elements from the remote web content in a display area of a primary website; a selection processor in operative communication with the user interface configured to receive a user selection identifying one or more selected elements of the remote web content; an equivalency engine in operative communication with the selection processor and configured to calculate equivalency classes comprising subsets of the identified elements determined to be structurally similar to the selected elements; and a web feed processor configured to generate a web feed for display to the user on the primary website, wherein the web feed includes at least the selected elements and one or more of the subsets of the identified elements determined to be structurally similar to the selected elements.
 2. The system of claim 1, wherein the input processor is further configured to generate a set of identified elements from the remote web content and, for each respective identified element, calculate at least one key comprising data describing structural characteristics of the respective identified element.
 3. The system of claim 2, wherein the calculating equivalency classes comprises: comparing the key for a first identified element to a second key for a second identified element to identify a structural similarity between the first and second identified elements; and determining whether the first and second identified elements are structurally equivalent.
 4. The system of claim 1, further comprising a metadata module configured to receive metadata representing relationships between repeating instances of identified elements from the remote web content.
 5. The system of claim 1, wherein the selection processor is further configured to update the display area of the primary website to visually emphasize the identified elements from the remote web content that are determined to be structurally similar to the selected elements.
 6. The system of claim 5, wherein the selection processor is further configured to receive a second user selection defining metadata or semantic information of the visually emphasized elements.
 7. The system of claim 5, wherein the selection processor is further configured to receive a second user selection selecting one or more identified elements that were not visually emphasized, or de-selecting one or more visually emphasized elements.
 8. The system of claim 7, wherein the calculated equivalency classes are updated based on the user selection of the identified elements or de-selection of the visually emphasized elements.
 9. The system of claim 1, wherein the web feed processor is further configured to automatically search the one or more remote websites, to identify additional structurally similar elements, and to generate an updated web feed including the additional structurally similar elements.
 10. A non-transitory computer readable storage medium having stored therein data representing instructions executable by a programmed processor for dynamically defining a web feed, the storage medium comprising instructions operative for: receiving a sample set including one or more remote webpages; extracting content from the one or more remote webpages to produce a set of identified elements; displaying the set of identified elements in a display area of a primary website; determining structural similarities of the set of identified elements; assigning a plurality of associated keys to each identified element in the set of identified elements, wherein each associated key describes a structural characteristic of the identified element; grouping in equivalence classes subsets of the identified elements which are determined to be structurally similar based at least on the associated keys; receiving a user selection identifying one or more selected elements from the set of identified elements displayed on the primary website; and generating a web feed for display to the user in the display area of the primary website including at least the one or more selected elements and the subset of identified elements determined to be structurally similar to the selected element.
 11. The storage medium of claim 10, wherein the web feed is automatically updated and regenerated according to a predefined schedule.
 12. The storage medium of claim 11, wherein the regenerating comprises automatically searching the one or more remote websites, identifying additional elements that are determined to be structurally similar to the subset of identified elements in the web feed, and generating an updated web feed including the additional elements determined to be structurally similar.
 13. The storage medium of claim 10, further comprising instructions operative for updating the display area of the primary website to visually emphasize the subset of the identified elements from the remote web content that are determined to be structurally similar to the selected elements.
 14. The storage medium of claim 13, further comprising instructions operative for receiving a user selection defining metadata or semantic information of the visually emphasized elements.
 15. The storage medium of claim 14, further comprising instructions operative for generating a semantic understanding of the remote website domain based on user defined metadata or semantic information.
 16. The storage medium of claim 10, further comprising instructions operative for determining at least one new equivalent element that is structurally similar to at least one identified element in the subset of identified elements, and automatically updating the web feed to display the new equivalent element.
 17. A computer-implemented method using a processor for dynamically defining a web feed, the method comprising: displaying a set of identified elements extracted from one or more remote webpages in a display area of a primary website; assigning a plurality of associated keys to each identified element in the set of identified elements, wherein each associated key describes a structural characteristic of the identified element; determining structural similarities of the identified elements extracted from the one or more remote webpages; grouping in equivalency classes a subset of the identified elements which are determined to be structurally similar based at least on the associated keys; receiving a user selection identifying one or more selected elements from the set of identified elements displayed on the primary website; and generating a web feed for display to the user in the display area of the primary website including at least the subset of identified elements determined to be structurally similar to the selected elements.
 18. The computer-implemented method of claim 17, further comprising visually emphasizing the displayed identified elements that are determined to be structurally similar to the selected elements.
 19. The computer-implemented method of claim 18, further comprising receiving a second user selection confirming that the visually emphasized identified elements are structurally similar to the selected element.
 20. The computer-implemented method of claim 17, further comprising identifying additional elements from the one or more remote web pages determined to be structurally similar to the selected elements, and automatically updating the displayed web feed to include the one or more additional elements.